Shallow Parsing by Inferencing with Classifiers

Authors

  • Vasin Punyakanok
  • Dan Roth
Abstract

We study the problem of identifying phrase structure. We formalize it as the problem of combining the outcomes of several different classifiers in a way that provides a coherent inference that satisfies some constraints, and develop two general approaches for it. The first is a Markovian approach that extends standard HMMs to allow the use of a rich observation structure and of general classifiers to model state-observation dependencies. The second is an extension of constraint satisfaction formalisms. We also develop efficient algorithms under both models and study them experimentally in the context of shallow parsing. [1]

1 Identifying Phrase Structure

The problem of identifying phrase structure can be formalized as follows. Given an input string $O = \langle o_1, o_2, \ldots, o_n \rangle$, a phrase is a substring of consecutive input symbols $o_i, o_{i+1}, \ldots, o_j$. Some external mechanism is assumed to consistently (or stochastically) annotate substrings as phrases [2]. Our goal is to come up with a mechanism that, given an input string, identifies the phrases in this string. This is a fundamental task with applications in natural language (Church, 1988; Ramshaw and Marcus, 1995; Muñoz et al., 1999; Cardie and Pierce, 1998).

The identification mechanism works by using classifiers that process the input string and recognize local signals in it that are indicative of the existence of a phrase. Local signals can indicate that an input symbol $o$ is inside or outside a phrase (IO modeling), that an input symbol $o$ opens or closes a phrase (OC modeling), or some combination of the two. In any case, the local signals can be combined to determine the phrases in the input string. This process, however, needs to satisfy some constraints for the resulting set of phrases to be legitimate. Several types of constraints, such as length and order, can be formalized and incorporated into the mechanisms studied here. For simplicity, we focus only on the most basic and common constraint: we assume that phrases do not overlap. The goal is thus two-fold: to learn classifiers that recognize the local signals, and to combine these in a way that respects the constraints.

2 Markov Modeling

An HMM is a probabilistic finite state automaton used to model the probabilistic generation of sequential processes. The model consists of a finite set $\mathcal{S}$ of states, a set $\mathcal{O}$ of observations, an initial state distribution $P_1(s)$, a state-transition distribution $P(s \mid s')$ for $s, s' \in \mathcal{S}$, and an observation distribution $P(o \mid s)$ for $o \in \mathcal{O}$ and $s \in \mathcal{S}$. [3]

In a supervised learning task, an observation sequence $O = \langle o_1, o_2, \ldots, o_n \rangle$ is supervised by a corresponding state sequence $S = \langle s_1, s_2, \ldots, s_n \rangle$. The supervision can also be supplied, as described in Sec. 1, using the local signals. Constraints can be incorporated into the HMM by constraining the state-transition probability distribution $P(s \mid s')$. For example, set $P(s \mid s') = 0$ for all $s, s'$ such that the transition from $s'$ to $s$ is not allowed.

* This research is supported by NSF grants IIS-9801638, SBR-9873450 and IIS-9984168.
[1] Full version is in (Punyakanok and Roth, 2000).
[2] We assume here a single type of phrase, and thus each input symbol is either in a phrase or outside it. All the methods we discuss can be extended to deal with several kinds of phrases in a string, including different kinds of phrases and embedded phrases.
[3] See (Rabiner, 1989) for a comprehensive tutorial.
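To illustrate the constraint mechanism described in Section 2 (setting the transition probability to zero for disallowed transitions), here is a minimal Python sketch of Viterbi decoding over a constrained HMM. It is not the authors' implementation. The three-state encoding ("O" outside a phrase, "B" first symbol of a phrase, "I" later symbol of a phrase), the toy vocabulary, and every probability value are assumptions made up for illustration. The only point shown is that a zero entry in the transition table removes the forbidden transition from every decoded sequence.

```python
import math

NEG_INF = float("-inf")


def log(p):
    """Log-probability; a zero probability maps to -inf (forbidden)."""
    return math.log(p) if p > 0.0 else NEG_INF


def viterbi(obs, states, init, trans, emit):
    """Return the most likely state sequence for obs.

    trans[prev][cur] holds P(cur | prev); emit[s][o] holds P(o | s).
    """
    # score[s] = best log-probability of any path ending in state s
    score = {s: log(init[s]) + log(emit[s][obs[0]]) for s in states}
    back = []  # one dict of backpointers per position after the first
    for o in obs[1:]:
        new_score, pointers = {}, {}
        for cur in states:
            best_prev = max(states, key=lambda p: score[p] + log(trans[p][cur]))
            new_score[cur] = (score[best_prev] + log(trans[best_prev][cur])
                              + log(emit[cur][o]))
            pointers[cur] = best_prev
        score = new_score
        back.append(pointers)
    # Follow backpointers from the best final state
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical model (all names and numbers are illustrative assumptions).
STATES = ["O", "B", "I"]
INIT = {"O": 0.6, "B": 0.4, "I": 0.0}      # a string cannot start mid-phrase
TRANS = {
    "O": {"O": 0.6, "B": 0.4, "I": 0.0},   # constraint: O -> I is not allowed
    "B": {"O": 0.3, "B": 0.1, "I": 0.6},
    "I": {"O": 0.4, "B": 0.1, "I": 0.5},
}
EMIT = {
    "O": {"det": 0.10, "noun": 0.10, "verb": 0.80},
    "B": {"det": 0.70, "noun": 0.25, "verb": 0.05},
    "I": {"det": 0.05, "noun": 0.85, "verb": 0.10},
}

if __name__ == "__main__":
    print(viterbi(["verb", "det", "noun", "verb"], STATES, INIT, TRANS, EMIT))
```

Working in log space lets a zero probability become negative infinity, so any path that uses a forbidden transition can never outscore a path with nonzero probability. In this toy model the all-"O" path always has nonzero probability, so the decoder always has a legal sequence to return.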


Similar resources

Extracting Ontological Relations of Korean Numeral Classifiers from Semi-structured Resources Using NLP Techniques

Many studies have focused on the fact that numeral classifiers give decisive clues to the semantic categorization of nouns. However, few studies have analyzed the ontological relationships of classifiers or the construction of a classifier ontology. In this paper, a semi-automatic method of extracting and representing the various ontological relations of Korean numeral classifiers is proposed. Sha...


Shallow Parsing with Conditional Random Fields

Conditional random fields for sequence labeling offer advantages over both generative models like HMMs and classifiers applied at each sequence position. Among sequence labeling tasks in language processing, shallow parsing has received much attention, with the development of standard evaluation datasets and extensive comparison among methods. We show here how to train a conditional random fiel...


The Use of Classifiers in Sequential Inference

We study the problem of combining the outcomes of several different classifiers in a way that provides a coherent inference that satisfies some constraints. In particular, we develop two general approaches for an important subproblem: identifying phrase structure. The first is a Markovian approach that extends standard HMMs to allow the use of a rich observation structure and of general classifi...


Chunk Parsing Revisited

Chunk parsing is conceptually appealing, but its performance has not been satisfactory for practical use. In this paper we show that chunk parsing can perform significantly better than previously reported by using a simple sliding-window method and maximum entropy classifiers for phrase recognition in each level of chunking. Experimental results with the Penn Treebank corpus show that our chunk p...


Shallow Semantic Parsing using Support Vector Machines

In this paper, we propose a machine learning algorithm for shallow semantic parsing, extending the work of Gildea and Jurafsky (2002), Surdeanu et al. (2003) and others. Our algorithm is based on Support Vector Machines which we show give an improvement in performance over earlier classifiers. We show performance improvements through a number of new features and measure their ability to general...


Shalmaneser - A Toolchain For Shallow Semantic Parsing

This paper presents SHALMANESER, a software package for shallow semantic parsing, the automatic assignment of semantic classes and roles to free text. SHALMANESER is a toolchain of independent modules communicating through a common XML format. System output can be inspected graphically. SHALMANESER can be used either as a “black box” to obtain semantic parses for new datasets (classifiers for E...



Publication date: 2000